Last Update: 2025/3/26
Qwen Audio Transcription API
The Qwen Audio Transcription API converts audio into text and can be called with OpenAI's SDK. This document describes the endpoint, request headers, request parameters, and response structure.
Endpoint
POST https://platform.llmprovider.ai/v1/audio/transcriptions
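Because the API is described as usable with OpenAI's SDK, a call through the official `openai` Python package (v1 or later) might look like the sketch below. The base URL is an assumption derived from the endpoint above by dropping the `/audio/transcriptions` suffix, and `transcribe_with_sdk` is a hypothetical helper name. The environment variable `YOUR_API_KEY` matches the one used in the Shell example.

```python
import os

# Assumption: the SDK base URL is the endpoint above without "/audio/transcriptions".
BASE_URL = "https://platform.llmprovider.ai/v1"


def transcribe_with_sdk(path, model="paraformer-v2"):
    """Hypothetical helper: transcribe one audio file through the OpenAI SDK."""
    # Imported here so the module loads even when the SDK is not installed.
    from openai import OpenAI  # pip install openai

    client = OpenAI(api_key=os.environ["YOUR_API_KEY"], base_url=BASE_URL)
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    # With the default json response format the SDK returns an object with .text.
    return result.text
```

Calling `transcribe_with_sdk("audio.mp3")` would then return the transcribed text, provided the service accepts the SDK's request format as assumed.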
Request Headers
Header | Value |
---|---|
Authorization | Bearer YOUR_API_KEY |
Content-Type | multipart/form-data |
Request Body
Parameter | Type | Description |
---|---|---|
file | file | The audio file object (not the file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm. Maximum file size: 20 MB. |
model | string | ID of the model to use (e.g., paraformer-v2). |
prompt | string | (Optional) Text to guide the model's style or continue a previous audio segment. |
response_format | string | (Optional) The format of the transcript output (json, text, srt, verbose_json, or vtt). Default is json. |
temperature | number | (Optional) The sampling temperature, between 0 and 1. Default is 0. |
language | string | (Optional) The language of the input audio (e.g., en, es, fr). |
timestamp_granularities[] | array | (Optional) The timestamp granularities to populate for this transcription. |
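The constraints in the table can be checked on the client before uploading. The following is a minimal sketch, not part of the API: `check_audio_file` and `build_form_fields` are hypothetical helper names, "20M" is assumed to mean 20 MiB, and the granularity value `"segment"` in the usage note is an assumption, since the table does not list the accepted values.

```python
import os

ALLOWED_EXTENSIONS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}
MAX_FILE_BYTES = 20 * 1024 * 1024  # assumption: "20M" means 20 MiB
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def check_audio_file(path, size_bytes=None):
    """Raise ValueError if the extension or size violates the documented limits."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {ext!r}")
    if size_bytes is None:
        size_bytes = os.path.getsize(path)
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError("file exceeds the 20 MB limit")


def build_form_fields(model, response_format="json", temperature=0,
                      language=None, prompt=None, granularities=None):
    """Return the non-file form fields as a list of (name, value) pairs."""
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unknown response_format: {response_format!r}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    fields = [("model", model),
              ("response_format", response_format),
              ("temperature", str(temperature))]
    if language is not None:
        fields.append(("language", language))
    if prompt is not None:
        fields.append(("prompt", prompt))
    # Array parameters are sent as one repeated form field per value.
    for value in granularities or []:
        fields.append(("timestamp_granularities[]", value))
    return fields
```

The returned list can be passed as `data=` to `requests.post` alongside `files=`, for example `build_form_fields("paraformer-v2", response_format="verbose_json", granularities=["segment"])`.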
Response Body
The transcription object
or a verbose transcription object
.
The transcription object(JSON)
Parameter | Type | Description |
---|---|---|
text | string | The transcribed text. |
{
  "text": "Hello, this is the transcribed text from the audio file."
}
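A caller usually only needs the transcript string. The sketch below shows one way to pull it out of a response body; `extract_transcript` is a hypothetical helper, and it assumes that the json and verbose_json formats return a JSON body while text, srt, and vtt return the transcript as plain text, which this document does not state explicitly.

```python
import json


def extract_transcript(body, response_format="json"):
    """Return the transcript from a raw response body (a str)."""
    if response_format in ("json", "verbose_json"):
        # Both JSON shapes carry the full transcript in the top-level "text" field.
        return json.loads(body)["text"]
    # Assumption: text, srt, and vtt responses are already plain text.
    return body
```

With the `requests` library, this would be called as `extract_transcript(response.text, "json")`.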
The transcription object (Verbose JSON)
Parameter | Type | Description |
---|---|---|
task | string | The task performed by the model. |
language | string | The language of the input audio. |
duration | number | The duration of the audio in seconds. |
segments | array | Segments of the transcribed text and their corresponding details. |
text | string | The transcribed text. |
words | array | Extracted words and their corresponding timestamps. |
{
  "task": "transcribe",
  "language": "en",
  "duration": 2.95,
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 2.95,
      "text": "Hello, this is the transcribed text from the audio file.",
      "tokens": [50364, 2425, 11, 359, 307, 1161, 1123, 422, 264, 1467, 1780],
      "temperature": 0.0,
      "avg_logprob": -0.458,
      "compression_ratio": 0.688,
      "no_speech_prob": 0.0192
    }
  ],
  "text": "Hello, this is the transcribed text from the audio file."
}
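The segments array is the part of the verbose object that most callers iterate over. As a minimal sketch, the hypothetical helper `format_segments` below turns each segment into a timestamped line, using only the `start`, `end`, and `text` fields shown in the example above.

```python
def format_segments(verbose):
    """Return one '[start - end] text' line per segment of a verbose transcription."""
    lines = []
    for segment in verbose.get("segments", []):
        start = segment["start"]
        end = segment["end"]
        text = segment["text"].strip()
        lines.append(f"[{start:.2f}s - {end:.2f}s] {text}")
    return lines


# Trimmed copy of the example response above.
sample = {
    "task": "transcribe",
    "language": "en",
    "duration": 2.95,
    "segments": [
        {
            "id": 0,
            "start": 0.0,
            "end": 2.95,
            "text": "Hello, this is the transcribed text from the audio file.",
        }
    ],
    "text": "Hello, this is the transcribed text from the audio file.",
}

for line in format_segments(sample):
    print(line)
```

A response without a segments array yields an empty list rather than raising an error.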
Example Request
Shell
curl -X POST https://platform.llmprovider.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $YOUR_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@audio.mp3" \
  -F model="paraformer-v2"
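Node.js
const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

// Read the API key from the environment (same variable as the Shell example).
const YOUR_API_KEY = process.env.YOUR_API_KEY;

// Build the multipart body: the audio file stream plus the model ID.
const formData = new FormData();
formData.append('file', fs.createReadStream('audio.mp3'));
formData.append('model', 'paraformer-v2');

axios.post('https://platform.llmprovider.ai/v1/audio/transcriptions', formData, {
  headers: {
    'Authorization': `Bearer ${YOUR_API_KEY}`,
    // getHeaders() supplies the Content-Type with the multipart boundary.
    ...formData.getHeaders()
  }
})
  .then(response => {
    console.log(response.data);
  })
  .catch(error => {
    console.error('Error:', error);
  });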
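Python
import os
import requests

# Read the API key from the environment (same variable as the Shell example).
YOUR_API_KEY = os.environ["YOUR_API_KEY"]

headers = {
    "Authorization": f"Bearer {YOUR_API_KEY}"
}

# Open the file in binary mode; the with-block closes it after the upload.
# requests sets the multipart Content-Type header (with boundary) itself.
with open("audio.mp3", "rb") as audio_file:
    response = requests.post(
        "https://platform.llmprovider.ai/v1/audio/transcriptions",
        headers=headers,
        files={"file": audio_file},
        data={"model": "paraformer-v2"},
    )

print(response.json())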
For any questions or further assistance, please contact us at [email protected].